Web document classification using topic modeling based document ranking

Authors

Abstract

In this paper, we propose a web document ranking method using topic modeling for effective information collection and classification. The proposed method is applied to a document ranking technique to avoid duplicated crawling when crawling at high speed. Through this technique, it is feasible to remove redundant documents, classify documents efficiently, and confirm that the crawler service is running. The proposed method enables rapid collection of many web documents, and users can efficiently search web pages whose data are constantly updated. In addition, the efficiency of information retrieval can be improved because new information can be automatically classified and transmitted. By expanding the scope to big-data-based web pages and improving the method for application to various websites, more effective information collection is expected to be possible.
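
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the pipeline it describes (drop redundant documents, rank, classify), under stated assumptions. Per-document topic distributions are assumed to come from an already-trained topic model such as LDA (not shown). Exact-duplicate detection via SHA-1 hashing of normalized text stands in for whatever redundancy check the authors use, and cosine similarity to a query's topic distribution is an assumed ranking score. The function names `dedupe` and `rank_and_classify` and the sample data are hypothetical.

```python
import hashlib
from collections import defaultdict


def dedupe(docs):
    """Drop documents whose normalized text was already seen (exact duplicates only).

    Assumption: redundancy = identical text after lowercasing and collapsing whitespace.
    """
    seen = set()
    unique = []
    for doc_id, text in docs:
        normalized = " ".join(text.lower().split())
        key = hashlib.sha1(normalized.encode("utf-8")).hexdigest()
        if key not in seen:
            seen.add(key)
            unique.append((doc_id, text))
    return unique


def cosine(p, q):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = (sum(a * a for a in p) ** 0.5) * (sum(b * b for b in q) ** 0.5)
    return dot / norm if norm else 0.0


def rank_and_classify(topic_dists, query_dist):
    """Rank documents by topic similarity to the query; group them by dominant topic.

    topic_dists: doc_id -> topic distribution (assumed output of a topic model).
    """
    ranking = sorted(topic_dists,
                     key=lambda d: cosine(topic_dists[d], query_dist),
                     reverse=True)
    classes = defaultdict(list)
    for doc_id, dist in topic_dists.items():
        dominant = max(range(len(dist)), key=dist.__getitem__)
        classes[dominant].append(doc_id)
    return ranking, dict(classes)


crawled = [
    ("a", "Topic models for crawling"),
    ("b", "topic   models for CRAWLING"),  # same text, different case/spacing
    ("c", "Power electronics drives"),
]
unique = dedupe(crawled)
print([doc_id for doc_id, _ in unique])  # ['a', 'c']

# Hypothetical two-topic distributions, as a trained topic model might output.
topic_dists = {"a": [0.9, 0.1], "c": [0.2, 0.8]}
ranking, classes = rank_and_classify(topic_dists, query_dist=[1.0, 0.0])
print(ranking)  # ['a', 'c']
print(classes)  # {0: ['a'], 1: ['c']}
```

Hashing makes the duplicate check O(1) per crawled page, which matters at high crawl speed; near-duplicate detection (for example, shingling) would need a different, more expensive check.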

Similar sources

RRLUFF: Ranking function based on Reinforcement Learning using User Feedback and Web Document Features

The principal aim of a search engine is to provide sorted results according to the user's requirements. To achieve this aim, it employs ranking methods to rank web documents based on their significance and relevance to the user query. The novelty of this paper is to provide a user feedback-based ranking algorithm using reinforcement learning. The proposed algorithm is called RRLUFF, in which the rank...

Document ranking using web evidence

Evidence based on web graph structure is reportedly used by the current generation of World-Wide Web (WWW) search engines to identify “high-quality”, “important” pages and to reject “spam” content. However, despite the apparent wide use of this evidence, its application in web-based document retrieval is controversial. Confusion exists as to how to incorporate web evidence in document ranking, a...

An Ensemble Click Model for Web Document Ranking

Every year, web search engine providers spend more and more money on document ranking in search engine result pages (SERPs). Click models provide useful information for ranking documents in SERPs by modeling interactions between users and search engines. Here, three modules are employed to create a hybrid click model; the first module is a PGM-based click model, the second module in a d...

Topic Continuity for Web Document Categorization and Ranking

PageRank is primarily based on link structure analysis. Recently, it has been shown that content information can be utilized to improve link analysis. We propose a novel algorithm that harnesses the information contained in the history of a surfer to determine his topic of interest when he is on a given page. As the history is unavailable until query time, we guess it probabilistically so that ...
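
The blurb builds on standard PageRank link analysis. For reference, here is a minimal power-iteration sketch of plain PageRank only; it is not the topic-continuity algorithm the paper proposes. The damping factor 0.85, the iteration cap, the tolerance, and the uniform redistribution of rank from dangling pages are conventional assumptions, and the function name `pagerank` is hypothetical.

```python
def pagerank(links, damping=0.85, iterations=100, tol=1e-10):
    """Plain PageRank by power iteration.

    links: page -> list of pages it links to. Pages that appear only as
    targets, or have no outlinks, are treated as dangling pages.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Rank held by dangling pages is spread uniformly over all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        # Each page splits its damped rank evenly among its outlinks.
        for p, targets in links.items():
            if targets:
                share = damping * rank[p] / len(targets)
                for t in targets:
                    new[t] += share
        delta = sum(abs(new[p] - rank[p]) for p in pages)
        rank = new
        if delta < tol:
            break
    return rank


web = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
scores = pagerank(web)
print(sorted(scores, key=scores.get, reverse=True))  # ['C', 'A', 'B']
```

Because every page's rank is fully redistributed each step, the scores always sum to 1; content-aware variants such as the one described above change how a page's rank is split among its outlinks rather than this overall structure.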

Web Document Classification based on Hyperlinks and Document Semantics

Besides the basic content, a web document also contains a set of hyperlinks pointing to other related documents. Hyperlinks in a document provide much information about its relation to other web documents. By analyzing hyperlinks in documents, inter-relationships among documents can be identified. In this paper, we propose an algorithm to classify web documents into subsets based on hyperl...
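
The blurb is truncated before the algorithm is described, so this is only a generic illustration of hyperlink-based classification, not the authors' method: a simple neighbor majority vote (a basic form of label propagation) in which each unlabeled page takes the most common label among pages it links to or is linked from. Treating links as undirected, the synchronous update, the round limit, and the name `classify_by_links` are all assumptions.

```python
from collections import Counter


def classify_by_links(links, seed_labels, rounds=5):
    """Label unlabeled pages by majority vote over hyperlinked neighbors.

    links: page -> list of hyperlinked pages (treated as undirected here).
    seed_labels: page -> known class label; seeds are never relabeled.
    """
    # Build an undirected neighbor map from the directed hyperlinks.
    neighbors = {}
    for src, targets in links.items():
        for dst in targets:
            neighbors.setdefault(src, set()).add(dst)
            neighbors.setdefault(dst, set()).add(src)
    labels = dict(seed_labels)
    for _ in range(rounds):
        # Synchronous update: all votes use labels from the previous round.
        updates = {}
        for page, nbrs in neighbors.items():
            if page in seed_labels:
                continue
            votes = Counter(labels[n] for n in nbrs if n in labels)
            if votes:
                updates[page] = votes.most_common(1)[0][0]
        if all(labels.get(p) == label for p, label in updates.items()):
            break  # no label changed, so stop early
        labels.update(updates)
    return labels


links = {"p1": ["p2", "p3"], "p4": ["p5"], "p5": ["p6"]}
seeds = {"p2": "sports", "p3": "sports", "p5": "finance"}
labels = classify_by_links(links, seeds)
print({p: labels[p] for p in ("p1", "p4", "p6")})
# {'p1': 'sports', 'p4': 'finance', 'p6': 'finance'}
```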

Journal

Journal title: International Journal of Power Electronics and Drive Systems

Year: 2021

ISSN: 2722-2578, 2722-256X

DOI: https://doi.org/10.11591/ijece.v11i3.pp2386-2392